MSC: a metagenomic sequence classification algorithm.
Internal identifier: 000492 (Main/Exploration); previous: 000491; next: 000493
Authors: Subrata Saha [United States]; Jethro Johnson [United States]; Soumitra Pal [United States]; George M. Weinstock [United States]; Sanguthevar Rajasekaran [United States]
Source:
- Bioinformatics (Oxford, England) [1367-4811]; 2019.
Abstract
Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.
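The unique k-mer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MSC implementation: MSC aligns reads onto model sequences built from unique k-mers, whereas the sketch below only shows the simpler alignment-free step of finding k-mers that occur in exactly one reference and counting how many of them a read hits. The function names, the toy sequences, and the choice of k = 4 are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references, k):
    """Map each reference name to the k-mers found in no other reference."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in references}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def classify(read, unique, k):
    """Return the reference whose unique k-mers the read hits most, or None."""
    read_kmers = kmers(read, k)
    hits = {name: len(read_kmers & ks) for name, ks in unique.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


# Toy references (hypothetical sequences, for illustration only).
references = {
    "genome_A": "ACGTACGTGG",
    "genome_B": "ACGTTTGACC",
}
unique = unique_kmers(references, k=4)
```

With these toy inputs, the k-mer `ACGT` occurs in both references and is therefore discarded, so a read consisting only of `ACGT` stays unclassified. A smaller k makes such shared k-mers more common, which is the trade-off between k and false positives that the abstract describes.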
DOI: 10.1093/bioinformatics/bty1071
PubMed: 30649204
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 000670
- to stream PubMed, to step Curation: 000670
- to stream PubMed, to step Checkpoint: 000488
- to stream Ncbi, to step Merge: 002081
- to stream Ncbi, to step Curation: 002081
- to stream Ncbi, to step Checkpoint: 002081
- to stream Main, to step Merge: 000495
- to stream Main, to step Curation: 000492
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">MSC: a metagenomic sequence classification algorithm.</title>
<author><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
<affiliation wicri:level="2"><nlm:affiliation>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD</wicri:regionArea>
<placeName><region type="state">Maryland</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering Department, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30649204</idno>
<idno type="pmid">30649204</idno>
<idno type="doi">10.1093/bioinformatics/bty1071</idno>
<idno type="wicri:Area/PubMed/Corpus">000670</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000670</idno>
<idno type="wicri:Area/PubMed/Curation">000670</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000670</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000488</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000488</idno>
<idno type="wicri:Area/Ncbi/Merge">002081</idno>
<idno type="wicri:Area/Ncbi/Curation">002081</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002081</idno>
<idno type="wicri:Area/Main/Merge">000495</idno>
<idno type="wicri:Area/Main/Curation">000492</idno>
<idno type="wicri:Area/Main/Exploration">000492</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">MSC: a metagenomic sequence classification algorithm.</title>
<author><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
<affiliation wicri:level="2"><nlm:affiliation>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD</wicri:regionArea>
<placeName><region type="state">Maryland</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering Department, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Connecticut</li>
<li>Maryland</li>
<li>État de New York</li>
</region>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
</region>
<name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000492 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000492 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:30649204 |texte= MSC: a metagenomic sequence classification algorithm. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:30649204" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.